... Rn G0/G1/G2 ... Gn B0/B1/B2 ... Bn format. The PS and cs5 implementations instead use the interleaved R0G0B0A0/R1G1B1A1/R2G2B2A2 ... RnGnBnAn arrangement with 4 channels. Therefore, in terms of computation alone, cs4 works on only one channel, so its scalar computation and bandwidth consumption will be lower than those of PS and cs5.
Performance Comparison
The input here is 512x512. Forward FFT and inverse FFT (IFFT) are compared on high-end, mid-range, and low-end graphics cards respectively.
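As a rough CPU-side illustration of the two operations being timed, the following is a minimal sketch of a forward 2D FFT followed by an inverse 2D FFT. It is not the shader code from the article: the function names `fft_1d`, `fft_2d`, and `roundtrip_error` are hypothetical, the recursive radix-2 algorithm is chosen only for brevity, and a small 8x8 input stands in for the 512x512 one.

```python
import cmath


def fft_1d(values, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    sign = 1 if inverse else -1  # the inverse transform flips the exponent sign
    even = fft_1d(values[0::2], inverse)
    odd = fft_1d(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_2d(image, inverse=False):
    """Transform every row, then every column; the inverse divides by the pixel count."""
    rows = [fft_1d(row, inverse) for row in image]
    cols = [fft_1d(list(col), inverse) for col in zip(*rows)]
    result = [list(row) for row in zip(*cols)]  # transpose back to row-major
    if inverse:
        scale = len(image) * len(image[0])
        result = [[v / scale for v in row] for row in result]
    return result


def roundtrip_error(size=8):
    """Largest difference between an image and IFFT(FFT(image))."""
    image = [[float((x * 7 + y * 3) % 5) for x in range(size)] for y in range(size)]
    restored = fft_2d(fft_2d(image), inverse=True)
    return max(abs(restored[y][x] - image[y][x])
               for y in range(size) for x in range(size))


print(roundtrip_error(8))  # a value close to zero
```

A round trip that returns the original image to within floating-point error is a quick sanity check before comparing timings of the two directions.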
NV
stream processors: for the latest generation of mid-range and high-end graphics cards, the more of these the better, in a sense. To judge whether a card has many stream processors, look mainly at its chip code name. For example, the Nvidia GTX 580 uses the GF110 chip and has 512 stream processors, while the GTX 660 Ti uses the GK104 chip and has 1344. These figures are generally printed on the graphics card box; if they are not, or an unscrupulous seller refuses to say, do
Because GPU memory capacity was limited at that time, the authors trained in parallel on 2 GTX 580 GPUs with 3 GB of memory each, so the network is split into two paths. Today a single graphics card has enough memory to run it as one path. In the network above:
The convolution kernels of the 5 convolutional layers are 11*11, 5*5, 3*3, 3*3, and 3*3 in size; the strides are 4, 1, 1, 1, 1 in order; the modes are valid, same, sam
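To make the effect of kernel size, stride, and the valid/same modes concrete, here is a minimal sketch of the usual output-size arithmetic. The function name `conv_output_size` and the 227-pixel input width are assumptions for illustration and do not come from the text; the formulas follow the common convention in which "valid" uses no padding and "same" pads so that the output size depends only on the stride.

```python
import math


def conv_output_size(input_size, kernel_size, stride, mode):
    """Output width of one convolution along a single spatial dimension."""
    if mode == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    if mode == "same":
        # Padding is added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    raise ValueError("mode must be 'valid' or 'same'")


# Assumed 227-pixel input; 11*11 kernel, stride 4, valid mode.
print(conv_output_size(227, 11, 4, "valid"))  # 55
# A 5*5 kernel with stride 1 in same mode keeps the width unchanged.
print(conv_output_size(55, 5, 1, "same"))     # 55
```

Pooling layers between the convolutions are not modeled here, so chaining these calls alone does not reproduce the full network's feature-map sizes.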
. Use a deeper and wider CNN to improve learning capacity.
3. Flexibly use ReLU as the activation function, which greatly improves training speed relative to sigmoid.
4. Use multiple GPUs to increase the capacity of the model.
5. Introduce competition between neurons through LRN, which helps generalization and improves model performance.
6. Randomly ignore some neurons through dropout to avoid overfitting.
7. Avoid overfitting by means of data augmentation such as zooming, flipping, an
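Two of the points above, ReLU versus sigmoid and dropout, can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names are hypothetical, and the dropout shown is the "inverted" variant that rescales kept activations during training, which is a common choice today and not necessarily what the original network did.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # The gradient stays at 1 for any positive input, so it does not saturate.
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # The gradient shrinks toward 0 for large |x|, which slows training.
    s = sigmoid(x)
    return s * (1.0 - s)


def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero each activation with probability drop_prob
    and rescale the survivors so the expected sum is unchanged."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


print(relu_grad(10.0), sigmoid_grad(10.0))
print(dropout([1.0, 1.0, 1.0, 1.0], 0.5, random.Random(0)))
```

Comparing the two gradients at a large input shows why a non-saturating activation tends to train faster, and the dropout output shows some units zeroed while the rest are doubled.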
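    import time

    import numpy
    import theano.tensor as T
    from theano import config, function, shared

    # Assumed values: the original definitions are cut off in this excerpt.
    vlen = 10 * 30 * 768
    iters = 1000
    rng = numpy.random.RandomState(22)

    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], T.exp(x))
    print(f.maker.fgraph.toposort())
    t0 = time.time()
    for i in range(iters):
        r = f()
    t1 = time.time()
    print("Looping %d times took %f seconds" % (iters, t1 - t0))
    print("Result is %s" % (r,))
    # A plain Elemwise node in the compiled graph means the work ran on the CPU.
    if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
        print('Used the CPU')
    else:
        print('Used the GPU')

Save the above code as check_gpu.py and test it with the following command. According to the test res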
When reprinting, please credit the source as the KlayGE game engine. The address of this article is http://www.klayge.org/2012/04/12/%e5%9f%ba%e4%ba%8epixel-shader%e7%9a%84fft%e5%b7%b2%e7%bb%8f%e5%ae%8c%e6%88%90/
The GPU FFT from GPU Gems 2 was implemented in KlayGE last weekend. After optimization and tuning, it was merged into the KlayGE development version last night. The complete FFT lens effects will be integrated soon.
Method 1 mentioned in the article is used here because, after testing, meth